These are the packages we need for this project.
library(tidyverse)
library(raster)
library(sf)
library(leaflet)
library(hash)
library(rgeos)
library(spatstat)
To see vegetation in our Aarhus area, we will need a satellite image with the red and near-infrared color bands, so we can calculate an NDVI.
# Define location for Aarhus for repeated leaflet drawing
aarhus_lng <- 10.2
aarhus_lat <- 56.15
aarhus_zoom <- 11
# EPSG 3857
aarhus_NIR <- raster('data/32VNH_0_B08,B04,B03.tiff', band = 1)
aarhus_red <- raster('data/32VNH_0_B08,B04,B03.tiff', band = 2)
NDVI_numerator <- aarhus_NIR - aarhus_red
NDVI_denominator <- aarhus_NIR + aarhus_red
aarhus_NDVI <- NDVI_numerator / NDVI_denominator
# Let's check it out
plot(aarhus_NDVI)
Unfortunately there were a lot of algae (or something like it) in the harbor when the image was taken but otherwise it looks great! NDVI is not a perfect indicator for what we define as a ‘green area’, but it is close enough for this proof-of-concept.
We will need to define a fitting NDVI threshold for our definition of a ‘green area’ so not every field and bush is displayed alongside parks and forests. To do this, we found inspiration here.
# Reclassify to 'no vegetation' and 'low vegetation' (threshold at NDVI == 0.2)
m_low <- matrix(c(-Inf, 0.2, NA, # no veg
0.2, Inf, 1), # low veg
ncol=3, byrow=TRUE)
# Reclassify to 'no vegetation' and 'high vegetation' (threshold at NDVI == 0.5)
m_high <- matrix(c(-Inf, 0.5, NA, # no veg
0.5, Inf, 1), # high veg
ncol=3, byrow=TRUE)
# Reclassify to 'no vegetation' and 'x-high vegetation' (threshold at NDVI == 0.6)
m_xhigh <- matrix(c(-Inf, 0.6, NA, # no veg
0.6, Inf, 1), # x-high veg
ncol=3, byrow=TRUE)
aarhus_veg_low <- reclassify(aarhus_NDVI, m_low)
aarhus_veg_high <- reclassify(aarhus_NDVI, m_high)
aarhus_veg_xhigh <- reclassify(aarhus_NDVI, m_xhigh)
# Show on map so we can validate a good NDVI threshold and minimum area for green areas
leaflet() %>%
addProviderTiles("Esri.WorldImagery") %>%
addRasterImage(aarhus_veg_low, group = 'Low threshold') %>%
addRasterImage(aarhus_veg_high, group = 'High threshold') %>%
addRasterImage(aarhus_veg_xhigh, group = 'Extra high threshold') %>%
setView(lng = aarhus_lng, lat = aarhus_lat, zoom = aarhus_zoom) %>%
addLayersControl(
overlayGroups = c("Low threshold", "High threshold", "Extra high threshold"),
options = layersControlOptions(collapsed = FALSE)
) %>% hideGroup(c("Low threshold", "High threshold"))
An NDVI threshold of 0.2 is clearly too low - most areas bleed together and we get a lot of fields, that are not really what we would characterize as a ‘green area’ in the sense that you would want to go for a walk in a field.
An NDVI threshold of 0.5 catches most of the parks and forests, which is pretty ideal, but we still get a lot of areas that are just grassy or fields.
An NDVI threshold of 0.6 seems to match pretty well with our definition of a ‘green area’ - we lose a bit of our wanted areas, but we also eliminate most of the gap-jumping (areas that span across roads etc.) and areas that do not have at least some taller vegetation than grass - a fair trade off.
It’s important to notice, that we are still working with an inaccurate CRS (EPSG 3857) for measuring areas and distances in Denmark, since reprojecting the raster earlier would have warped the pixel values, mutating our NDVI-raster to include impossible values. Since we no longer need the pixel values, but rather the distances and areas, we can now reproject the raster into EPSG 25832.
# Rename the best raster for ease of use
aarhus_veg <- aarhus_veg_xhigh
# Reproject raster since EPSG 3857 distances are very warped this far north
aarhus_veg <- projectRaster(aarhus_veg, crs = 25832) # 25832 has very accurate distances in Denmark - we measured
# Let's verify that the pixels are now correct
res(aarhus_veg)
## [1] 20.1 20.0
The Sentinel-2 L2A satellite should theoretically give us images where every pixel corresponds to 20m^2, so this is very acceptable for our purposes, since we are mostly interested in relative distances in the coming heatmap, and our method for calculating the ‘green area’ area of the region of interest is not precise enough to begin with to worry too much about a 10cm discrepancy on the x-axis of every pixel.
w From the leaflet above, we can also determine that more than about 10 square pixels (~200^2) looks to be a good cutoff size for a green area that one would visit on a stroll - it is right about the size when a small patch of trees become a (in our opinion) ‘green area’. Let’s remove the small areas.
# Number of pixels^2 above which we allow an area to be kept in our map
area_min_pix <- 10 # (~200m^2)
# Dictionary to keep track of how many pixels belong to every continuous clump of pixels (distinct areas)
id_counts <- hash()
# Give every pixel an ID corresponding to which green area it is a part of
area_map <- clump(aarhus_veg, direction = 4) # only use rook's case, since areas are not often connected diagonally
# Select only the values we need
area_ids <- area_map@data@values
# Count how many pixels there are of every ID
for (id in area_ids) {
id <- toString(id)
if (!has.key(id, id_counts)) {
id_counts[[id]] <- 1
} else {
id_counts[[id]] <- id_counts[[id]] + 1
}
}
# Remove NA from the dictionary, we will only keep the keys of the IDs to remove from the raster
del("NA", id_counts)
# Since the number of pixels with a given ID directly corresponds
# to the area of that clump, we can use the pixel count to filter by size
# Remove the IDs of big areas from dictionary of area IDs
for (key in keys(id_counts)) {
# Area size corresponds to pixel counts of clump with the given ID
area_size <- id_counts[[key]]
# Remove big areas from this dictionary
if (area_size > area_min_pix) {
del(key, id_counts)
}
}
# Remove pixels belonging to small clumps from original raster
for (i in 1:length(area_ids)) {
# Select the green area ID of the pixel
val <- toString(area_ids[[i]])
# Whether to remove this pixel (i.e. is this ID in the dict of pixel IDs to remove)
remove_pixel <- has.key(val, id_counts)
# Set the corresponding pixel in the original raster to NA
# since it is part of small area
if (remove_pixel) {
aarhus_veg@data@values[[i]] <- NA
}
}
# See if the pixel clumps at and below 10 pixels are gone
leaflet() %>%
addProviderTiles("Esri.WorldImagery") %>%
addRasterImage(aarhus_veg_xhigh, group = 'Before') %>%
addRasterImage(aarhus_veg, group = 'After') %>%
setView(lng = aarhus_lng, lat = aarhus_lat, zoom = aarhus_zoom) %>%
addLayersControl(
overlayGroups = c("Before", "After"),
options = layersControlOptions(collapsed = FALSE)
) %>% hideGroup("Before")
Small patches of trees (etc.) are now gone, so we can sleep soundly knowing that our application will not deceive people into thinking they are living near a park when in reality they just have a tree in their back yard.
We are now ready to generate a new raster heatmap, where every single location (in our defined area) is denoted with the distance to the nearest ‘green area’ by its color.
# Copy the green areas raster so we don't mutate the original
aarhus_dist_heatmap <- aarhus_veg
# Reclassify ones to zeroes
# every 1 symbolizes a green area, but we now need it to symbolize the distance to the nearest green area (which is 0)
m <- matrix(c(1, 0),
ncol=2, byrow=TRUE)
aarhus_dist_heatmap <- reclassify(aarhus_dist_heatmap, m)
# Fill in every transparent pixel (NA) with the value of the distance to nearest green pixel
aarhus_dist_heatmap <- distance(aarhus_dist_heatmap)
# Define a fitting color gradient for drawing distances
veg_colors <- colorRampPalette(c("darkgreen", "lightblue"))
# Check plot for the highest distance found only on land
# so we can remove very high values only found at sea, since these stretch the legend significantly
plot(aarhus_dist_heatmap, col = veg_colors(20))
Our new heatmap has a few very high distance values, mostly because the large sections of water obviously are not very close to vegetation (except for some annoying edge cases). Looking at the heatmap with a small number of color steps allows us to get a rough idea of which distances are the maximum observed on land, which is the interesting part for us. Using a color picker, we can tell that values on land mostly go to ~1km.
To combat the problem of our color range and legend being stretched (which makes it hard to distinguish smaller distance values), we can simply remove all distance values that we observe to only be present at sea. This makes our heatmap more readable. We have fine tuned this maximum allowed distance to 1200 meters by trial and error.
# Color picker says ~1km to green area is max on land, so we set it to 1.2km to be safe
max_dist <- 1200 # meters
# Reclassify longer distances to not stretch the color gradient as much
m <- matrix(c(max_dist, Inf, NA), # everything above max_dist is set to NA
ncol=3, byrow=TRUE)
aarhus_dist_heatmap <- reclassify(aarhus_dist_heatmap, m)
# Defining a color palette with more steps to work on leaflet legend
palette <- colorNumeric(
veg_colors(100),
raster::values(aarhus_dist_heatmap),
na.color = "transparent"
)
# See that the heatmap works and lines up with green areas and the shore
leaflet() %>%
addProviderTiles("Esri.WorldImagery") %>%
addRasterImage(
aarhus_dist_heatmap,
group = "Heatmap",
colors = veg_colors(100)
) %>%
setView(lng = aarhus_lng, lat = aarhus_lat, zoom = aarhus_zoom) %>%
addLegend(
"bottomright",
pal = palette,
values = raster::values(aarhus_dist_heatmap),
title = "Dist. (m.)",
group = "Heatmap",
opacity = 1
) %>%
addLayersControl(
overlayGroups = c("Heatmap"),
options = layersControlOptions(collapsed = FALSE)
)
Looking at the heatmap overlaid on a satellite image of Aarhus, we can verify that our maximum allowed distance lines up pretty well with the shoreline, which makes our color range as informative as possible without losing any important data.
To make it a bit easier to see where you are when in heatmap mode, let’s load in some roads.
roads_4326 <- st_read('data/osm-roads/gis_osm_roads_free_1.shp')
# Transform roads crs to match raster
roads_25832 <- st_transform(roads_4326 , crs = crs(aarhus_veg))
# Verify that roads can be cropped to the raster's bounds correctly
compareCRS(roads_25832, crs(aarhus_veg))
# Bounding box of satellite image
aarhus_box <- st_make_grid(aarhus_veg, n = 1)
# Cropping the roads, so we don't have superfluous data to load and display
roads_cropped <- st_intersection(aarhus_box, roads_25832)
Finally, we can display a leaflet to help you determine the distance to the nearest ‘green area’ to go for a nice walk in, wherever you are (inside of our defined Aarhus area).
The leaflet is comprised of the following parts: - A base map to make it easier finding a(n) point / area of interest - A heatmap with colors indicating distances to the nearest ‘green area’ (as the crow flies) - A map of only the ‘green areas’ to make it easier distinguishing between the values of e.g. 1 meter to a ‘green area’ and the ‘green areas’ themselves - Roads to make it easier navigating while in ‘heatmap mode’ - A legend that maps color values to distance values in meters - A toggle control to change between the base map and the heatmap
leaflet() %>%
# For finding a specific location easier
addProviderTiles("OpenStreetMap") %>%
# The heatmap
addRasterImage(
aarhus_dist_heatmap,
group = "Heatmap",
colors = veg_colors(100)
) %>%
# The green areas highlighted
addRasterImage(
aarhus_veg,
group = "Heatmap",
colors = "#36453B"
) %>%
# Roads make it easier to understand locations when in heatmap mode
addPolylines(
data = st_transform(roads_cropped, crs = 4326),
weight = 3,
opacity = 0.3,
color = "#fff",
group = "Heatmap") %>%
# Shows colors for the distances
addLegend(
"bottomright",
pal = palette,
values = raster::values(aarhus_dist_heatmap),
title = "Dist. (m.)",
group = "Heatmap",
opacity = 1
) %>%
# Look at Aarhus
setView(lng = aarhus_lng, lat = aarhus_lat, zoom = aarhus_zoom) %>%
# Toggle heatmap mode
addLayersControl(
overlayGroups = c("Heatmap"),
options = layersControlOptions(collapsed = FALSE)
) %>% hideGroup("Heatmap")
You can now find the best place to buy a house if you want to live close to a park! Or maybe you just like to look at applied data - don’t we all.
Knowing how plants and trees reproduce and the effects of urbanisation on vegetation, we would expect to see ‘green areas’ being more closely located some places than other. Is this the case for our defined Aarhus area?
# Clump together pixels to make a polygon for every area
aarhus_veg_clump <- clump(aarhus_veg, direction = 4)
aarhus_veg_poly <- rasterToPolygons(aarhus_veg_clump, dissolve = TRUE)
# Convert to sf to use sf methods
aarhus_veg_poly <- st_as_sf(aarhus_veg_poly)
# Calulate centroids to get points to analyse
aarhus_veg_centroids <- st_centroid(aarhus_veg_poly)
# Generate window necessary for ppp conversion
win_xrange <- extent(aarhus_veg_centroids)[1:2]
win_yrange <- extent(aarhus_veg_centroids)[3:4]
centroid_owin <- owin(xrange = win_xrange, yrange = win_yrange)
# Separate coordinates into separate columns, which is necessary for ppp conversion
aarhus_veg_centroids <- aarhus_veg_centroids %>%
mutate(x = unlist(map(aarhus_veg_centroids$geometry, 1)),
y = unlist(map(aarhus_veg_centroids$geometry, 2)))
# Convert centroids to ppp
aarhus_veg_centroids <- ppp(x = aarhus_veg_centroids$x, y = aarhus_veg_centroids$y, centroid_owin)
# Perform quadrat test to see if green areas are clustered
quad_test <- quadrat.test(aarhus_veg_centroids, nx = 10, ny = 10)
quad_test
##
## Chi-squared test of CSR using quadrat counts
##
## data: aarhus_veg_centroids
## X2 = 605.03, df = 99, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
## Quadrats: 10 by 10 grid of tiles
plot(quad_test) # easier to read in fullscreen
The quadrat test will give us a basic understanding of whether the location of ‘green areas’ differ from a totally random placement across our focus area. Looking at the returned χ^2 value, we can see that there is indeed a difference from a totally random spatial distribution of ‘green areas’, and with a p-value of ~0.00000000000000022, we can definitely say that this is an unlikely result to get from a random distribution, which seems to confirm our hypothesis, that ‘green areas’ are unevenly distributed in our region of interest.
However, when we look at the plot of the quadrat test, we can tell that a lot of this difference stems from the sea area to the east, in which we have (nearly) no ‘green areas’. For example, when looking at most of the central easternmost cells (those corresponding with the sea area), we can see that we expect to see ~14.5 green areas, but only find 0 or 1, which pretty heavily skews our results, since we are only interested in the spatial distribution of ‘green areas’ on land.
To find this value, we simply have to divide the total area of ‘green areas’ by the total land area of our region of interest. These areas can be calculated with the amount of ‘green’ pixels (0 meter distance values) in our heatmap and every other distance value pixel that is not NA (to approximate only land area, since NA somewhat represents sea area).
# Init counters for total area that is not NA and areas of 'green' pixels
green_area <- 0
total_area <- 0
# Counts total pixels and 'green' pixels
for (pix_val in aarhus_dist_heatmap@data@values) {
if (!is.na(pix_val)) {
total_area <- total_area + 1
if (pix_val == 0) {
green_area <- green_area + 1
}
}
}
# Output the fraction of total area that is 'green'
green_area / total_area
## [1] 0.2297205
About 23% of the total area of our (imprecisely) defined Aarhus area is covered in what we have chosen to define as ‘green areas’.
To find out what these areas are in non-pixel units, we simply multiply the pixel count by our pixel size (in meters) to get the m² area, and for easier readability, we can convert m² to hectares by multiplying again with 10⁴.
# Output areas in hectares
pixel_size <- res(aarhus_veg)
green_area_hectares <- green_area * pixel_size[1] * pixel_size[2] / 10^4
total_area_hectares <- total_area * pixel_size[1] * pixel_size[2] / 10^4
green_area_hectares
## [1] 10115.57
total_area_hectares
## [1] 44034.24
That is ~10,100 hectares of ‘green area’ out of a total of ~44,000 hectares of land in our region of interest!